Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#5889] feat(client-python): Add model management Python API #6009

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jerryshao
Copy link
Contributor

What changes were proposed in this pull request?

This PR proposes to add Python client API for model management.

Why are the changes needed?

This is part of work to support model management in Gravitino.

Fix: #5889

Does this PR introduce any user-facing change?

No.

How was this patch tested?

UT added.

@jerryshao jerryshao self-assigned this Dec 26, 2024
Comment on lines +90 to +93
resp = self.rest_client.get(
self._format_model_request_path(model_full_ns),
error_handler=MODEL_ERROR_HANDLER,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds a little bit dangerous if the GET doesn't work as expected, or the return is not application/json.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think such kind of exceptions/errors is handled by rest_client, we don't need to handle it here in this class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

"""

@abstractmethod
def version(self) -> int:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we want to use string as version number for generic scenarios.
If this means an internal representation, maybe we want to call it something like revision.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current design, version is a self-incremented integer version number, in the meantime, we also have "aliases" for the convenience of use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

except NoSuchModelVersionException:
return False

def get_model_version_by_alias(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_model_alias for consistent method naming style?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My feeling is that the name may be misleading, users may feel that this method is to get the alias, not the model version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was comparing this to the get_model_version method, which can be interpreted as "getting a model by its specific version".
If that is true, we can name this to get_model_alias.
After revisiting this collection of methods, I see what this method was designed for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me clarify the concept of model management in Gravitino. "Model" is a concept mapping to an ML model, "Model Version" is a snapshot of the specific "Model", one "Model" can have multiple "Model Versions". Besides, one "Model Version" can have multiple aliases, so users can use either "version" or "alias" to get a specific "Model Version".

So here, get_model_version means get a specific model version entity by version number, and get_model_version means get a specific model version entity by alias. It is not getting a model, it is getting a model version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Now I think I get the intention behind the "model version" thing and I know where all these come from.

TBH, the concept of "model version" as an entity was not so straightforward to me,
because "version" in my mind was not supposed to be treated as first-class entities.
A "version" has to be associated with a model, it is not supposed to managed separately. The "version" value can be associated with more than one model,
if we are using this term to represent a semantic version.
I did get that the "version" in this context has nothing to do with semantic
version, it is actually a sequence number, used to form a unique key for an entity.

Now we have a dilemma. We still need an attribute for "version" in the traditional
sense. So we invented a new term "alias", which further complicates the conceptual
design as a whole. Can/shall we promote "alias" to be a version string?

self.link_model_version(ident, uri, aliases, comment, properties)
return model

@staticmethod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer moving all static methods out of the class considering that they are simply utility functions.

Comment on lines +30 to +34
_name: str = field(metadata=config(field_name="name"))
_comment: Optional[str] = field(metadata=config(field_name="comment"))
_properties: Optional[Dict[str, str]] = field(
metadata=config(field_name="properties")
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These class properties seem useless?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're mainly used for json serialization.

@@ -34,13 +33,13 @@ def __init__(self, levels: List[str]):
self._levels = levels

def to_json(self):
return json.dumps(self._levels)
return self._levels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do this changes?

Copy link
Contributor Author

@jerryshao jerryshao Dec 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bug in the previous code, it treats a json list as a string, which will lead to failure in deserialization.

Raises:
IllegalArgumentException if the request is invalid
"""
if not self._uri:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember we have checked the validness of alias, it seems that the same logic is missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Subtask] Implement the Python client API for model catalog.
3 participants